ABSTRACT

A sample of slightly over 1,500 students drawn from 
even-numbered grades in public schools of the U.S. Virgin Islands 
(USVI) were given the 1973 edition of the Stanford Achievement Test 
(grades 2, 4, 6, and 8) and the Test of Academic Skills (grades 10 
and 12) in an attempt to assess student academic achievement in the 
basic skill areas of mathematics, reading, and English language. This 
report summarizes the results of this study which were reported in 
detail in three previous technical reports. The tests were found to 
be content valid and reliable when administered to students in USVI 
public schools, even though the tests were designed for curricula 
used in the continental United States and were standardized using a 
sample of examinees attending continental U.S. schools. Item analysis 
revealed differences between the local and standardization samples 
based on the cognitive complexity of items on all subtests of the 
batteries which were administered. There were also indications of 
effects of local dialects on responses to the language subtests at 
all levels. Finally, the data indicated that most students were 
unable to complete the reading comprehension subtests in the standard 
time allotted. (Author/LMO)
Abstract



A sample of slightly over 1500 students was drawn from
even-numbered grades in public schools of the U.S. Virgin Islands and
these students were given the 1973 edition of the Stanford
Achievement Test (in grades 2, 4, 6, and 8) and the Test of
Academic Skills (grades 10 and 12) in an attempt to assess
student academic achievement in the basic skill areas of
mathematics, reading, and English language. This report
summarizes the results of this study which were reported in 
detail in three previous technical reports. 

The tests were found to be content valid and reliable when 
administered to students in USVI public schools even though the 
tests were designed for curricula used in schools in the 
continental United States and were standardized using a sample of 
examinees attending continental U.S. schools. 

Item analysis revealed differences between the local and 
standardization samples based on the cognitive complexity of 
items on all subtests of the batteries which were administered. 
There were also indications of effects of local dialects on 
responses to the language subtests at all levels. Finally, the 
data indicated that most students were unable to complete the 
reading comprehension subtests in the standard time allotted. 
It has almost become a matter of faith that achievement in
basic skills (i.e. English language and mathematics) in public 
schools under the American flag has deteriorated over the past 
twenty years. Proponents of this idea point to evidence as 
formal as decreases in typical scores on the Scholastic Aptitude 
Test and standardized tests of academic achievement and as 
informal as the writing and arithmetic skills they perceive in
the young people around them. The National Commission on 
Excellence in Education has told us, "Our Nation is at risk. Our
once unchallenged preeminence in commerce, industry, science, and
technological innovation is being overtaken by competitors
throughout the world. . . . If an unfriendly foreign power had
attempted to impose on America the mediocre educational
performance that exists today, we might well have viewed it as an
act of war" (1983, p. 5). Boyer (1983), in a report commissioned
for the Carnegie Foundation, echoes this concern, if in a somewhat
less alarmist tone. 

The reactions of people to this perceived phenomenon are also 
varied. On the government level they include the requirement 
that all students score a minimum grade on a test of basic skills 
in order to receive a high school diploma; that teachers pass a 
similar test to obtain teacher certification; and that schools 
require students to take additional course work in basic skills 
areas. In addition, federal, state, and local governments have 
initiated programs to provide support in the forms of grants and 
technical assistance to schools at all levels to do research and 
set up programs designed to improve student achievement in basic
skills areas.

At a different level, parents, concerned that public schools 
are not doing an adequate job in preparing their children in the
basic skills, are choosing, in increasing numbers, to remove 
their children from public schools and are placing them in 
religious and secular private schools. While there are other 
reasons for this proliferation of private schools besides the 
purely academic, the desire for high quality academic preparation 
is one compelling cause of this phenomenon. 

The public schools, themselves, have reacted strongly to this 
crisis in public confidence. These reactions include an increase 
in required courses in the language and mathematics areas with a 
corresponding decrease in electives in areas considered less 
"basic." Projects to revise curricula in basic skills areas 
proliferate and are receiving more support than they have 
received since the reevaluation of American education engendered 
by the shock of Sputnik in the late 1950s.

Improving basic skills achievement was a concern of the
Department of Education of the government of the Virgin Islands 
of the United States when it approached the College of the Virgin 
Islands to provide aid in improving such instruction. In an 
effort to provide this service, the Caribbean Research Institute, 
the college's research arm, worked with a task force composed of 
representatives from the Department of Education and CRI to
determine a course of action.

It became clear after the first few task force meetings that 
the development of any strategy designed to improve basic skills 
achievement needed to start off with a fairly detailed 
description of current achievement levels of students in
territorial schools. This information was not available since the 
school system had no program of standardized testing in place. 
Various published achievement tests from the continental United 
States were administered from time to time at the discretion of 
building principals, but the test used and the time of testing 
were at the whim of these administrators and the records kept of 
these results were rather haphazard. The Iowa Test of Basic
Skills was administered systemwide to sixth graders, but some
building administrators often refused to have these tests
administered in their schools and there were years when, due to
fiscal constraints, the tests were not given at all. Even the
scores obtained were of little use since they were reported
in norm referenced form based on U.S. national and local norms
and provided no information as to the particular skills students
possessed or lacked. Finally, none of the tests used had been
validated using local students and there was a strong feeling 
that cultural and curricular differences between mainland U.S. 
and U.S. Virgin Islands students and schools rendered the 
reliability and validity of these scores questionable. There 
were no standardized tests of academic achievement administered 
on the secondary level anywhere in the territory's public 
schools.

This problem is not unique to the U.S. Virgin Islands. The 
widespread use of standardized achievement testing in the English 
speaking Caribbean has posed a series of problems for educators 
in this area of the world. Not the least of these problems 
involves the reliability of scores obtained from students to whom
these tests are administered. 

During the colonial period, curriculum was imported, as a 
more or less complete package, from the mother country, complete 
with form examinations which were designed and standardized
overseas. Local school people had little or no autonomy in terms of
curriculum or evaluation procedures. With the coming of
independence or increased local control of internal affairs and the
emergence of most of these former colonies into nationhood, this 
control has disappeared and national or local ministries and 
departments of education now play the major role in determining 
the curriculum, including evaluation procedures, which will be 
used in their schools. While most of these jurisdictions still 
hold strong emotional and cultural ties to their former or 
present mother countries, there are strong pressures for their 
educational systems to move toward more independent, locally 
relevant curricula with the accountability this type of movement
would dictate. Valid and reliable tests of achievement are a 
necessary component of this accountability. 

Standardized, commercially published achievement tests have
much to recommend them as instruments for meeting
the goal of high standard evaluation. The items on these tests
tend to be technically superior to those found on informal tests. 
They have gone through a series of trials and revisions and have 
met standards of clarity and precision that tend to be high and 
well defined. In addition, much is known about typical 
performances of students in a particular population when they are 
administered these tests. Also, test publishers go to great 
efforts to determine the curriculum used in the target population
schools and to include items which constitute a representative 
sample of the cognitive objectives in the curriculum at these 
institutions. Finally, machine scoring is usually available for 
these tests and results may be reported out either in criterion 
referenced form or norm referenced based on a reasonably well 
defined population. 

It is in these latter two areas that English speaking
Caribbean jurisdictions find major difficulties. Published
standardized tests of achievement are standardized using
non-Caribbean peoples. This is not surprising considering the small
population of the area and the resulting small markets for such
tests. The costs of producing high quality tests are extremely
high and publishers must look toward large markets when planning
new tests and revisions of preexisting examinations. As a result
of this, it is hard for people involved in decision making at 
Caribbean ministries and departments of education to be confident 
that a given test or series of tests evaluates a representative 
sample of the objectives in their curricula, i.e. that the test 
is content valid. 

Additionally, while the items used on standardized 
achievement tests seem to function well for examinees in the 
population from which the standardization sample was drawn, there 
may be some justifiable concern about whether or not these items 
will function in a similar manner when administered to Caribbean 
students since these students were not part of the population 
from which the standardization samples were drawn (generally 
populations of students in the continental United States, the
United Kingdom, or Canada). Given the possibility of cultural
factors affecting test taking performance, Caribbean decision
makers might well be justified in being hesitant to accept the 
results of standardization procedures reported by the publishers 
of these tests. 

An obvious solution to this dilemma is for each ministry
and department of education to develop and standardize a set of 
achievement tests appropriate for the content of its curriculum 
and the test taking characteristics of its students, keeping in 
mind the need to create alternative forms of these tests and the 
need to update them, periodically. While this may initially 
appear to be a workable procedure, the costs in time and money 
are probably beyond the resources of most of these educational 
systems. In addition, the technical expertise required to carry 
out such an effort would most likely be unavailable locally and 
the importing of persons from the outside prohibitively expensive 
and undesirable in other ways. 

A second solution would be convincing commercial test 
publishers to produce achievement tests which were content valid 
for local curricula and standardized on the local population of 
students. It seems unlikely that this effort would bear any 
fruit based on geographical considerations and the relatively
small market that would be available for such tests. 

The U.S. Virgin Islands researchers developed a third
alternative to solving the problem of obtaining valid and 
reliable scores on achievement tests that might also be 
applicable to other areas of the English speaking Caribbean. 



This alternative is based on the assumption that while there may 
have been changes in curriculum with the development of greater 
internal control or political independence, most Caribbean
states retain much of the educational structure that existed
during the period of greater outside control due to strong
emotional and cultural ties to the larger country and the usual
strong conservative predisposition of educational systems in most 
democratic countries. These are not unreasonable assumptions and 
the latter is supported by the facts that most high level school 
officials involved in decision making capacities at ministries 
and departments of education in the Caribbean received all or 
part of their training in American, British, or Canadian colleges 
and universities, and that most of the texts used in the English 
speaking Caribbean are published in these three countries. The 
latter is particularly important in light of the fact that, in 
most classrooms, the content of the curriculum is determined 
primarily from the content of the text book or books being used. 

Under these assumptions, the alternative proposes that
published standardized tests used and standardized in school
systems similar to those in question be examined, first to
determine the content validity of these tests given the curriculum in
the local school system, and then to establish the reliability of 
scores and the test taking behaviors of a representative sample 
of local students in order to determine the appropriateness of 
the chosen test. Finally, if the test appears to function well 
for the population of local students, adjustments to the test 
items and/or test taking instructions can be made based on the 
results of the local validation study before the test is placed
in use on a system wide basis. 

Based on these assumptions, the task force decided to test
the appropriateness of an achievement test of basic skills
devised and standardized in the continental United States when it
was administered to students in the U.S. Virgin Islands public
schools. The Virgin Islands of the United States is an
unincorporated territory of the U.S. comprising some 50 islands and cays
in the Caribbean Sea. The two largest islands, St. Thomas and 
St. Croix, are separated by 40 miles. The island of St. John 
lies three miles east of St. Thomas. The total land mass of the 
territory is 132 square miles. Only the three largest islands 
have a sizable permanent population, estimated at about 120,000. 
This is augmented by a transient population of almost one million 
tourists each year. Of the permanent population, approximately
80% are of West Indian heritage, either having been born in the
U.S. Virgin Islands or having immigrated from other islands in
the Lesser Antilles. St. Croix has a significant Hispanic
population, originally from Puerto Rico and its smaller islands of
Vieques and Culebra. The official language of the territory is
English with many persons speaking a patois derived from English, 
Dutch, and French at home and in informal circumstances. 

The K-12 population of the public schools is approximately
25,000 with education being compulsory from age six to sixteen. 
Standard English is the language of instruction. The population 
of students attending USVI public schools is primarily West Indian
and Hispanic. Approximately 94% of students attending are
entitled to free lunch under the U.S. Department of Agriculture's 
school lunch program. The vast majority of residents from the
continental U.S. and other middle socioeconomic status families
send their children to one of the many private day schools in the
territory. 

Although it is separated from the U.S. mainland by 1100 miles 
of ocean, the USVI is hardly isolated. The three local 
television stations broadcast network television (including PBS) 
and television stations from Atlanta, Chicago, and New York are 
available on cable television. New York and Miami newspapers are 
available on a same day basis and U.S. magazines are available on 
a regular basis. Nor is the school system curricularly isolated. 
Most basic skills curriculum is imported, intact, from the 
continental United States. Reading is taught using the Ginn 720 
series and mathematics using the series published by
Silver Burdett Co., for instance. English grammar is taught using the
time honored series by Warriner and literature texts published by 
mainland U.S. publishers are used in both elementary and 
secondary schools. 

Teachers in the public school system tend to have been 
trained primarily at the College of the Virgin Islands, the 
territory's public land grant college, or at mainland U.S. 
colleges and schools of education. The former provides a 
standard U.S. college curriculum with a traditional program of
teacher education.

Given these similarities in curriculum and teacher 
preparation, and the degree of communication with the continental 
U.S., the task force agreed to choose, attempt to validate, and 
use a standardized test of basic skills which had been published
in the mainland U.S. and used in mainland U.S. schools.

The Instrument

In choosing a test battery to be validated on U.S. Virgin 
Islands students, the following criteria were used:

1) The test needed to be technically sound in terms of 
reliability and item discrimination, at least for the
population on which it had been standardized.

2) The test needed to be content valid for U.S. Virgin 
Islands public school students. That is, the test needed to 
contain items which tested a representative sample of the 
content and behaviors actually taught at various levels in 
the USVI public schools. 

3) The test publishers needed to include a detailed 
description of the objectives tested while providing an item
by objective keying procedure. 

4) Scores which indicated students' performances relative to 
each objective needed to be available. That is, criterion 
referenced scoring was a requirement. 

The 1973 versions of the Stanford Achievement Test and the
Test of Academic Skills were chosen as the test battery which best
appeared to meet the above criteria.

Validation Procedure

Sampling. The June 1, 1980 enrollment in the public schools
in the Virgin Islands of the United States was 25,426 according
to the statistics issued by the USVI Department of Education. It 
was clear that testing this number of students was economically 
unfeasible. The preferred alternative would have been to 
generate a random sample of students in grades K-12 to be tested,
but it was equally clear that this would have produced an 
intolerable disruption of classroom activities. Therefore, in an 
attempt to obtain a representative sample of students, cluster 
sampling was used with the clusters being defined as classes. 
The number of classes to be selected for the sample from each 
grade in each of the St. Thomas/St. John and St. Croix school
districts was determined by calculating the proportion of the
total K-12 student population in each grade in each district and 
assuming a class size of 30 in the elementary schools and 27 in 
the secondary schools. 
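
As a rough illustration of the allocation rule just described, the following Python sketch assigns whole classes to each district-by-grade cell in proportion to that cell's share of enrollment. Only the assumed class sizes (30 and 27) come from the text. The enrollment figures, the target of 400 students, the cutoff between elementary and secondary grades, the rule of at least one class per cell, and all function names are hypothetical and are not taken from the study.

```python
def assumed_class_size(cell):
    """Class sizes assumed in the text: 30 (elementary), 27 (secondary).

    Treating grades 1-6 as elementary is an assumption of this sketch.
    """
    _district, grade = cell
    return 30 if grade <= 6 else 27


def classes_to_sample(enrollment, target_students):
    """Allocate whole classes to each (district, grade) cell in proportion
    to the cell's share of total enrollment."""
    total = sum(enrollment.values())
    allocation = {}
    for cell, n_students in enrollment.items():
        share = n_students / total                  # proportion of enrollment
        students_wanted = share * target_students   # students owed to this cell
        n_classes = round(students_wanted / assumed_class_size(cell))
        allocation[cell] = max(1, n_classes)        # hypothetical floor of one class
    return allocation


# Hypothetical enrollments, for illustration only.
enrollment = {
    ("STT/STJ", 2): 1200,
    ("STT/STJ", 8): 1080,
    ("STX", 2): 900,
    ("STX", 8): 820,
}
allocation = classes_to_sample(enrollment, target_students=400)
```

Because whole classes are drawn, the realized number of students in a cell depends on actual class sizes and will generally differ from the proportional target.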

Selecting whole classes presented an additional difficulty. 
The small number of classes selected in each grade might have 
made obtaining a representative sample of students more 
difficult: while classes in a given
elementary school may be heterogeneous, the schools themselves
are not, because elementary schools in the U.S. Virgin
Islands are essentially neighborhood schools. Virgin Islands
neighborhoods tend to be homogeneous in terms of socioeconomic 
status of residents. To overcome this problem, it was decided to 
increase the number of classes tested in a given grade (thereby 
increasing the number of schools within the territory from which 
these classes came) without increasing the total number of 
students tested by testing at alternate grades. This seemed 
acceptable since many of the objectives tested by the Stanford 
Achievement Test carry across adjacent levels of the test and 
there was no reason to suspect that the patterns of academic 
achievement of students in even numbered grades were different
from those in odd numbered grades. Classes in grades 2, 4, 6, 8, 
10, and 12 were given the test. Bliss (1982a) describes this 
procedure in detail and comments on the effects of sampling 
classes rather than individual students. Table 1 presents the 
number of students tested at each grade level in each school 
district. 



Table 1

U.S. Virgin Islands Sample Sizes

         Test                Total     STT/STJ     STX
Grade    Level               System    District    District

12       TASK II               129        74          55
10       TASK I                254       167          87
 8       Advanced              345       173         172
 6       Intermediate II       227       146          81
 4       Primary III           346       186         160
 2       Primary I             234       143          91

         TOTALS               1535       889         646



Table 2 lists the subtests, and the number of items in each, of
the battery of the Stanford Achievement Test which was administered
at each grade level.
Table 2

Stanford Subtests Administered
at Each Grade Level

Grade                       Subtest                      Number of Items

Grade 12                    Reading                            78
(TASK II Level)             Mathematics                        48
                            English                            69

Grade 10                    Reading                            78
(TASK I Level)              Mathematics                        48
                            English                            69

Grade 8                     Vocabulary                         50
(Advanced Level)            Reading Comprehension              74
                            Mathematics Concepts               35
                            Mathematics Computation            45
                            Mathematics Applications           40
                            Spelling                           60
                            Language                           80

Grade 6                     Vocabulary                         50
(Intermediate II Level)     Reading Comprehension              71
                            Mathematics Concepts               35
                            Mathematics Computation            45
                            Mathematics Applications           40
                            Spelling                           60
                            Word Study Skills                  50
                            Language                           80

Grade 4                     Vocabulary                         45
(Primary III Level)         Reading Comprehension              70
                            Word Study Skills                  55
                            Mathematics Concepts               52
                            Mathematics Computation            36
                            Mathematics Applications           28
                            Spelling                           47
                            Language                           55

Grade 2                     Vocabulary                         37
(Primary I Level)           Reading Comprehension              87
                            Word Study Skills                  60
                            Mathematics Concepts               32
                            Mathematics Computation            32
                            Listening Comprehension            26

Testing Procedures. Testing was done at the grade level
recommended by the test publisher. This was done primarily to 
ensure the content validity of the examinations. Tests were
administered by classroom teachers or guidance counselors, at the 
discretion of building administrators. Each person who was to 
administer tests attended a two hour training session at either
the St. Thomas or the St. Croix campus of the College of the
Virgin Islands. During this time the purpose of the testing was
explained, the test and instruction manual were reviewed, a 
testing schedule was distributed and reviewed, and testing 
materials were distributed. These included a practice test to be 
given to students in grades 2, 4, and 6 the day prior to the 
first day of testing in order to give these students experience 
in reading and answering items on this type of test. Answer 
sheets were sent off island to be machine scored. 

Content validity. The content validity of the various levels
of the Stanford Achievement Test was determined using the
following strategies:

1) Written curriculum guides used in the
public schools were collected. The objectives stated or
implied in these documents were compared with
the lists of objectives tested provided by the test
publisher.

2) Text books used in the teaching of basic skills subject 
matter were collected from selected schools. Stated and 
implicit objectives in these texts were compared with the 
test publisher's objectives. 

3) The test objectives were shown to elementary and
secondary subject area supervisors who were asked to
determine the degree of match between those objectives and 
what was taught at the indicated grade levels. 

4) Selected building principals in St. Thomas were asked to 
review the objectives of the test and give their opinions 
concerning the degree of match between these objectives and 
the objectives taught in their schools.

5) Teachers who administered the tests in their classrooms 
were asked to review the test publisher's objectives and to 
determine the degree of match between these objectives
and the basic skills they expected their students to have
obtained.

Using these techniques, the researchers were satisfied that 
the test did, indeed, test a sample of objectives that was 
consistent with the objectives used in teaching in the public 
schools of the Virgin Islands of the United States. 

Reliability. Kuder-Richardson 20 estimates of internal
consistency were calculated for each test of each battery for the 
entire USVI sample and the subsamples from each of the two school 
districts. It was noted that, in most cases, the variances of 
the raw scores obtained by the USVI sample of students were 
considerably lower than those reported for the standardization 
group. This phenomenon is commonly found
when testing samples drawn from populations composed largely of
persons from low socioeconomic status homes. Since the
reliability of a test is partially dependent on the 
heterogeneity of the scores obtained (the greater the spread of 
scores, the higher the reliability), the local reliability estimates were
adjusted for homogeneity using a procedure described by Allen and
Yen (1979). See Bliss (1982a) for details of this procedure. 
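
The two calculations named in this paragraph can be sketched as follows. The first function is the standard KR-20 formula for dichotomously scored items. The second is one common textbook form of the correction for group homogeneity, which assumes that the standard error of measurement is the same in the local and reference groups; whether it matches the exact procedure applied in Bliss (1982a) is an assumption of this sketch. The function names and the toy response data are hypothetical.

```python
from statistics import pvariance


def kr20(responses):
    """Kuder-Richardson 20 for a list of examinee response vectors,
    each coded 1 (correct) or 0 (incorrect) per item."""
    n_examinees = len(responses)
    n_items = len(responses[0])
    total_variance = pvariance([sum(r) for r in responses])
    # Sum of item variances p(1 - p), where p is the proportion correct.
    pq_sum = 0.0
    for j in range(n_items):
        p = sum(r[j] for r in responses) / n_examinees
        pq_sum += p * (1 - p)
    return (n_items / (n_items - 1)) * (1 - pq_sum / total_variance)


def adjust_for_homogeneity(r_local, var_local, var_reference):
    """Reliability expected if the local group had the reference group's
    score variance, assuming equal standard errors of measurement."""
    return 1 - (var_local / var_reference) * (1 - r_local)


# Toy data: five examinees, four items (illustration only).
toy = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
r_toy = kr20(toy)
r_adjusted = adjust_for_homogeneity(r_toy, var_local=2.0, var_reference=4.0)
```

The adjustment raises the estimate whenever the local variance is smaller than the reference variance, which is the situation the text describes for the USVI sample.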

There needed to be a criterion for making decisions regarding 
the acceptability of the adjusted reliability estimates. The 
Stanford Achievement Test is considered to have more than 
acceptable reliability when administered to the population of 
examinees upon which it was standardized (i.e. continental U.S. 
students). Among the indications of this are numerous reviews of 
the test in the literature (Kasdon, 1974; Lehmann, 1975; Chase,
1978; Ebel, 1978; Thorndike, 1978) and the fact that it is widely 
used in the schools. However, the literature is replete with 
studies which indicate that standardized tests of academic 
achievement tend to produce less reliable scores when 
administered to students from low socioeconomic status homes and 
to those who are culturally different from the majority of those 
on whom the test was normed (see reviews and discussions in 
Anastasi, 1958; Tyler, 1956; and Deutsch, 1960). Therefore, if
the reliability estimates obtained from a sample of U.S. Virgin 
Islands students who took the Stanford Achievement Test are at 
least equal to the reliability estimates obtained from the 
standardization samples, it is reasonable to conclude that the 
test scores are reliable indicators of academic achievement for 
these students. 

For each adjusted reliability estimate obtained from the USVI 
sample, a reliability difference was found by subtracting the 
standardization group's reliability estimate from the local group 
reliability estimate. The median reliability difference across
all tests for all grades was -.002, with a range from -.06 to +.02
and a somewhat negatively skewed distribution. When Z
transformations were used to normalize the distribution of the
reliability estimates, t-tests revealed two subtests out of the
total 36 examined where the local sample reliability estimates
were significantly lower than those of the standardization group
at the p=.05 level (see Bliss, 1982a). This is approximately
the number that would be expected by chance. The standard errors
of measurement (which are not affected by the variances of the
scores) were treated in a similar manner and it was found that
there were only three out of 36 standard error estimates which
were significantly higher than those obtained from the
standardization sample.
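
The appeal to chance can be checked with simple arithmetic. The sketch below assumes, for illustration only, that the 36 comparisons are independent and that each has a .05 probability of a spurious significant result; neither assumption is stated in the report, and the function name is hypothetical.

```python
from math import comb


def prob_at_least(k, n, alpha):
    """P(X >= k) for X ~ Binomial(n, alpha)."""
    return sum(
        comb(n, i) * alpha**i * (1 - alpha) ** (n - i)
        for i in range(k, n + 1)
    )


n_tests, alpha = 36, 0.05

# Expected number of spuriously significant results: 36 x .05 = 1.8.
expected_by_chance = n_tests * alpha

# Probability of seeing at least two, or at least three, by chance alone.
p_two_or_more = prob_at_least(2, n_tests, alpha)
p_three_or_more = prob_at_least(3, n_tests, alpha)
```

Under these assumptions an expected count of 1.8 is consistent with the statement that two significant differences is about what chance would produce.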

Item discrimination. The item discrimination index indicates
the degree to which responses on one item are related to 
responses on other items of the test. The statistic indicates 
whether a person who does well on the test as a whole (that is, a 
person who is presumably high on the trait being measured) is 
more likely to get a particular item correct than a person who 
does poorly on the test as a whole. In other words, the item 
discrimination index indicates whether an item discriminates
between those who do well and those who do poorly on the test as a whole.
Taking the item difficulty and the item discrimination index into 
consideration, the developers of tests desire to construct tests 
which discriminate well among examinees with varying levels of a 
trait. 
The item discrimination index is calculated by the formula
d = (U - L)/N, where

U = the number of examinees who have total test scores in the
upper range of total test scores and who also have the item
correct.

L = the number of examinees who have total test scores in the
lower range of total test scores and who also have the item
correct.

N = the number of examinees in the upper or lower range of
the test scores.

By definition, d is the difference between the proportion of high 
scoring examinees who got the item correct and the proportion of 
low scoring examinees who got the item correct. The upper and 
lower ranges generally are defined as the upper and lower 10% to
33% of the sample, with examinees ordered on the basis of their
total test score. When total scores are normally distributed,
using the upper and lower 27% produces the best estimate of d
(Kelley, 1939). If the distribution of total test scores is
flatter than the normal curve, the optimum percentage is larger 
and approaches 33%. However, Allen and Yen (1979) found that, 
for most applications, any percentage between 25 and 33 will 
yield similar estimates of d. In this study, 27% was used as the 
upper and lower percentages because examination of selected
distributions of actual test scores revealed nearly normal
distributions.
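
A minimal Python sketch of the index as defined above, using the 27% upper and lower groups adopted in this study. The function name, the rounding of the group size, and the handling of tied total scores (ties keep their input order) are assumptions of the sketch, not details taken from the report.

```python
def discrimination_index(total_scores, item_correct, fraction=0.27):
    """Upper-lower discrimination index d = (U - L) / N for one item.

    total_scores: total test score for each examinee
    item_correct: 1 if that examinee answered the item correctly, else 0
    """
    n_examinees = len(total_scores)
    group_size = max(1, round(fraction * n_examinees))  # N in the formula
    # Rank examinees by total score, highest first.
    order = sorted(
        range(n_examinees), key=lambda i: total_scores[i], reverse=True
    )
    upper = order[:group_size]
    lower = order[-group_size:]
    upper_correct = sum(item_correct[i] for i in upper)  # U
    lower_correct = sum(item_correct[i] for i in lower)  # L
    return (upper_correct - lower_correct) / group_size


# Ten hypothetical examinees with total scores 10 down to 1.
totals = list(range(10, 0, -1))
d_strong = discrimination_index(totals, [1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
d_weak = discrimination_index(totals, [1, 0, 1, 0, 1, 0, 1, 0, 1, 0])
```

With ten examinees the 27% rule gives groups of three. The first item is answered correctly by the whole upper group and by none of the lower group, so it discriminates perfectly; the second separates the groups only weakly.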

The theoretical range of d is between -1 and +1. However,
maximum discrimination is likely to occur when the difficulty
index p (the proportion of examinees answering the item correctly)
equals .50. When p=.50 the variance in item scores, which
is p(1-p), is maximized. As an item becomes more difficult, it
is less likely that any student will score correctly on it. As
it becomes less difficult it is more likely that any student will
get it correct. This could lead to the suggestion that all items
have p=.50, but the usefulness of this suggestion is mitigated by
intercorrelations among items. In an extreme case, if the items
on a test all intercorrelated perfectly and had difficulties of
0.50, half the examinees would receive a total test grade of zero
and the other half would have perfect test scores. Hence, there
would be no fine distinctions between examinees' levels of
achievement on whatever trait is being measured. In general,
test designers try to choose items with a range of difficulties
that average around .50. Items of particularly low difficulty 
are often included for motivational reasons. 
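
The two points made above, that p(1-p) peaks at p=.50 and that 
perfectly intercorrelated items yield only two distinct total 
scores, can be checked with a few lines of arithmetic. The 
difficulty values, the test length of 20 items, and the eight 
examinees below are hypothetical. 

```python
# Item-score variance p(1 - p) for a range of difficulty indices.
difficulties = [0.10, 0.25, 0.50, 0.75, 0.90]
variances = {p: p * (1 - p) for p in difficulties}
best_p = max(variances, key=variances.get)   # difficulty with the largest variance

# Extreme case: 20 items, all with p = .50 and perfectly intercorrelated.
# Each examinee then answers every item the same way.
n_items = 20
examinees_correct = [1, 0, 1, 0, 1, 0, 1, 0]   # half succeed, half fail
total_scores = [answer * n_items for answer in examinees_correct]
distinct_totals = sorted(set(total_scores))   # only 0 and 20 occur
```

Because every examinee in this extreme case scores either 0 or 
20, the test separates the sample into two groups and nothing 
finer. 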

Item discrimination indices were calculated in this study to 
provide indications that items may be flawed when used with USVI 
students. Such flaws include ambiguity, the presence of clues, 
the presence of more than one correct answer, and other technical 
defects. If none was found upon examination of the item, and it 
was determined that the item did, indeed, appear to measure the 
objective it was intended to, the item was included in the 
overall analysis of the results. Any item that discriminates 
positively can make a contribution to the measurement of pupil 
achievement, and low indices of discrimination are frequently 
obtained for reasons other than item defects. 

Standardized achievement tests are designed to measure 
several different types of learning outcomes (e.g., knowledge, 
understanding, application, etc.). Where this is the case, the 
test items that represent an area receiving relatively little 
emphasis will tend to have poor discriminating power. For 
example, if a test has forty items measuring knowledge of 
specific facts and ten items measuring understanding, the latter 
items can be expected to have low discrimination indices. This 
is because the items measuring understanding have less 
representation in the total test score and there is typically a 
low correlation between measures of knowledge and measures of 
understanding. Low discrimination indices here merely indicate 
that these items are measuring something different from what the 
major part of the test is measuring. Removing such items from the 
test would make it a more homogeneous measure of knowledge 
outcomes, but it would also damage the content validity of the 
test because it would no longer measure objectives in the 
understanding area. Since achievement test batteries need to 
measure a wide variety of objectives in a reasonably short period 
of time, they tend to be fairly heterogeneous in nature and 
moderately low discrimination indices tend to be the rule rather 
than the exception. 
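
The forty-item and ten-item example above can be explored with a 
small simulation. This is a sketch under stated assumptions, not 
an analysis of the study data. It assumes two normally 
distributed traits correlated at 0.2, a logistic response model 
with a slope of 1.5, 2,000 simulated examinees, and upper and 
lower groups of 27%. All of these values, and the function name, 
are invented for illustration. 

```python
import math
import random


def simulate_mean_d(n_examinees=2000, n_knowledge=40, n_understanding=10,
                    rho=0.2, slope=1.5, fraction=0.27, seed=0):
    """Simulate a mixed test and return the mean d for each item type."""
    rng = random.Random(seed)
    responses = []          # one list of 0/1 item scores per examinee
    for _ in range(n_examinees):
        knowledge = rng.gauss(0, 1)
        # Understanding correlates only weakly (rho) with knowledge.
        understanding = (rho * knowledge
                         + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1))
        row = []
        for j in range(n_knowledge + n_understanding):
            trait = knowledge if j < n_knowledge else understanding
            p_correct = 1 / (1 + math.exp(-slope * trait))   # logistic model
            row.append(1 if rng.random() < p_correct else 0)
        responses.append(row)

    # Rank examinees by total score and take the upper and lower groups.
    responses.sort(key=sum, reverse=True)
    n = round(n_examinees * fraction)
    upper, lower = responses[:n], responses[-n:]

    def d(j):
        return (sum(r[j] for r in upper) - sum(r[j] for r in lower)) / n

    k_items = range(n_knowledge)
    u_items = range(n_knowledge, n_knowledge + n_understanding)
    mean_d_knowledge = sum(d(j) for j in k_items) / n_knowledge
    mean_d_understanding = sum(d(j) for j in u_items) / n_understanding
    return mean_d_knowledge, mean_d_understanding


mean_d_knowledge, mean_d_understanding = simulate_mean_d()
```

Under these assumptions the understanding items should show the 
lower mean index, even though nothing is wrong with them as 
items. They are simply outnumbered in the total score used to 
form the upper and lower groups. 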

To summarize, a low discrimination index alerts test users to 
the possible presence of defects in test items, but does not cause 
them to discard these items if they appear to be functioning as 
they should. A well constructed achievement test will, of 
necessity, contain items with low discriminating power and to 
discard them would result in a test which is less, rather than 
more, valid. Due to these considerations, in this study items 
were examined if they had discrimination indices lower than .20. 
This is a rather conservative criterion since items that 
discriminate as low as this may provide useful information, but 
given the unknown test taking characteristics of USVI students, 
it was decided to be particularly cautious in the item analysis. 
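
The screening rule stated above amounts to a simple filter on the 
discrimination indices. In the sketch below only the .20 
criterion comes from the text. The function name, the item 
labels, and the index values are hypothetical. Flagged items are 
set aside for inspection, not discarded. 

```python
REVIEW_THRESHOLD = 0.20   # criterion stated in the text


def items_to_review(indices, threshold=REVIEW_THRESHOLD):
    """Return labels of items whose discrimination index is below the threshold."""
    return [label for label, d in indices.items() if d < threshold]


# Hypothetical discrimination indices for four items.
indices = {"item 1": 0.45, "item 2": 0.12, "item 3": 0.20, "item 4": 0.05}
flagged = items_to_review(indices)   # items 2 and 4 fall below .20
```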

For the most part, items which did not discriminate 
satisfactorily tended to be those which had extremely high or low 
difficulty indices (i.e., items which the local sample of students 
found either very easy or very difficult). In no case did the 
items seem ambiguous or discriminate negatively. 


Student Skills and Test Taking Behaviors 

Bliss (1982b) reports the level of specific basic skills 
attained by Virgin Islands secondary public school students in 
the sample of students tested in this study. The same 
information for public elementary school students is presented in 
a later document (Bliss, 1984). These reports provide indices of 
discrimination for each item in each battery of the tests in all 
grades as well as difficulty indices and lists of items by 
objectives which point out areas of weakness and strength in 
basic skills areas for the sample of Virgin Islands public school 
students. Included in the reports are the criteria used to 
determine satisfactory discrimination indices and definitions of 
high, satisfactory, and poor achievement. These documents are 
available from the Caribbean Research Institute at the College of 
the Virgin Islands. 

Level I and Level II Objectives. A close look at the 
difficulty indices tended to disclose a consistent pattern. 
There appeared to be a set of skills and knowledge which the 
students in the USVI sample were able to master at levels 
comparable to students in the standardization sample. From 
grades 2 through 12, the proportions of students scoring 
correctly on items testing these skills and knowledge tended to 
be as high as or higher than the proportions of students in the 
standardization sample. A second set of skills and knowledge 
seemed to exist which the USVI sample of students appeared to be 
consistently less successful in mastering than the examinees in 
the standardization group. 

The Taxonomy of Educational Objectives (Bloom, 1956) appears 
to provide a conceptual hook for understanding these two item 
groups. The vast majority of items in the first group appear to 
test objectives which would be classified in the lower three 
levels of the taxonomy (i.e., knowledge, comprehension, and 
application). These include items which require students to 
spell, compute solutions to mathematical equations using simple 
algorithms, solve simple, one step mathematical problems to which 
an algorithm can be directly applied, and determine explicit 
meaning in written passages. Most items in the second group 
appear to test objectives which could be classified in the upper 
three levels of the taxonomy (i.e., analysis, synthesis, and 
evaluation). These items require students to solve multistep 
mathematical word problems, determine relationships, make choices 
concerning appropriate language usage from context, and 
determine global, contextual, and inferential meaning from 
written passages. These findings seem consistent with those of 
Jensen (1968) concerning the interaction of socioeconomic status 
and Level I and Level II abilities. 

Jensen noted that there were socioeconomic differences in 
students' abilities to master objectives which would be 
classified in the three higher levels of the taxonomy (Level II 
abilities), with non-middle class students having greater 
difficulty mastering these objectives than students from middle 
class homes. There were no differences between these two groups 
of students in their abilities to master objectives in the three 
lower categories of the taxonomy (Level I abilities). Table 3 
provides a breakdown of the proportions of examinees scoring 
correctly on items testing Level I and Level II abilities for 
subtests in the batteries given to examinees in grades 2, 6, and 
10 as illustrations of this phenomenon. The fact that most USVI 
public school children come from non-middle class homes while 
middle class students were represented in the standardization 
sample tends to support this model as an explanation for these 
findings. 
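
The kind of comparison summarized in Table 3 can be sketched as 
the average difference between the local and standardization 
proportions correct, grouped by objective level. The proportions 
below are invented for illustration and are not the values in 
Table 3. The function name and record layout are likewise 
assumptions. 

```python
from collections import defaultdict

# Hypothetical records:
# (objective level, p in local sample, p in standardization sample)
items = [
    ("I", 0.82, 0.78),
    ("I", 0.74, 0.75),
    ("II", 0.41, 0.58),
    ("II", 0.36, 0.49),
]


def mean_gap_by_level(records):
    """Average (local p - standardization p) for each objective level."""
    gaps = defaultdict(list)
    for level, p_local, p_norm in records:
        gaps[level].append(p_local - p_norm)
    return {level: sum(values) / len(values) for level, values in gaps.items()}


gaps = mean_gap_by_level(items)
```

A mean gap near zero for Level I items together with a clearly 
negative gap for Level II items would correspond to the pattern 
described above. 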

Language. In the area of language usage, it was noted that 
the responses of many students were consistent with the grammar 
and syntax of the local dialect. This included the dropping of 
plurals, the confusing of the nominative and objective forms of 
pronouns with the overuse of the nominative form, the dropping of 
the indefinite article with the overuse of the definite article, 
and the dropping of past tenses of verbs. This phenomenon was 
observed across all grades and is significant since the language 
of instruction in the schools is officially standard English and 
the objectives of the school call for instruction in the use of 
standard English with absolutely no instruction in the use of 
dialect. 

Omitted responses of the reading test. Finally, an 
examination of the proportions of omitted responses for each item 
indicated that for all subtests except reading comprehension, 
examinees had sufficient time to attempt all items when the time 
recommended by the test publisher was allowed for completion of 
the subtest. On the reading comprehension subtest, it was noted 
that examinees in grades 6 through 12 showed proportions of omits 
which tended to increase steadily after about the twentieth item 
on each test with more than 50% of the examinees omitting the 
last 15 or so items on the tests. Figure 1 shows this phenomenon 
graphically for grade 8 examinees. 

Figure 1. Proportions of Omitted Responses on the Reading 
Comprehension Test for Grade 8 Examinees 

A number of explanations for this phenomenon are being 
considered. The first of these suggests that the students in the 
local sample are more deliberate readers than their counterparts 
in the continental United States. They, therefore, read more 
slowly and do not skim passages to be read. A second is that 
local examinees have a shorter attention span and simply stop 
taking the test after a certain point and before the end of the 
testing period. Both of these are rendered plausible by the fact 
that this phenomenon is not observed in grade 2, where the 
questions are asked orally by the examiner and the test is 
divided into two periods, a day apart. Equally noteworthy is the 
fact that the phenomenon appeared only in the second half of the 
fourth grade reading test. The first half of the test consisted 
of a series of independent short answer items while the second 
consisted of the passage reading type used in the higher grades. 
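
The examination of omitted responses by item position can be 
sketched as follows. The data layout (one list of responses per 
examinee, with None marking an omitted item), the function names, 
and the four answer sheets are hypothetical. Only the 50% cutoff 
echoes the figure cited in the text. 

```python
def omit_proportions(responses):
    """Proportion of examinees omitting each item.

    responses: one list per examinee; None marks an omitted item.
    """
    n_examinees = len(responses)
    n_items = len(responses[0])
    return [sum(1 for row in responses if row[j] is None) / n_examinees
            for j in range(n_items)]


def first_item_mostly_omitted(proportions, cutoff=0.50):
    """1-based position of the first item omitted by more than `cutoff`."""
    for position, proportion in enumerate(proportions, start=1):
        if proportion > cutoff:
            return position
    return None   # no item exceeds the cutoff


# Hypothetical answer sheets for four examinees on a six-item test.
sheets = [
    ["A", "C", "B", "D", None, None],
    ["A", "B", "B", None, None, None],
    ["B", "C", "D", "D", "A", None],
    ["A", "C", None, None, None, None],
]
props = omit_proportions(sheets)
first_heavy_omit = first_item_mostly_omitted(props)
```

Proportions that rise steadily toward the end of a test, as they 
do in these invented sheets, are the signature of examinees 
running out of time or stopping early, as opposed to skipping 
isolated difficult items. 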



Summation 

This document summarizes a series of reports on basic skills 
achievement in the public schools of the U.S. Virgin Islands. 
The series clearly documents the weaknesses and strengths of 
students in these schools and should be read as a totality before 
actions are taken. 

The plain truth of the matter is that in most basic skills 
areas, U.S. Virgin Islands students are not greatly different 
from their continental U.S. counterparts and are, in fact, 
superior in such Level I skills as spelling and some areas of 
mathematics computation. Where they fall short seems to be in 
the area of Level II skills which require a series of operations 
and cannot be carried out by the memorizing of simple algorithms. 
Since the Stanford Achievement Test has shown itself to be a 
valid indicator of these skills and there is no reason to suspect 
that U.S. Virgin Islands students are innately inferior in 
academic ability to students in the continental United States, 
the issues involved appear to be those of instruction. A number 
of suggestions for improving instruction in high level skills in 
the basic areas are presented below: 

1) In order to teach higher level objectives, teachers have to 
have achieved these objectives themselves. A program of testing 
for teachers and/or teacher applicants for new positions should 
be set in place in order to ascertain the proficiency of these 
educators in dealing with these higher level tasks. The results 
of this testing should not be used as criteria for hiring and 
firing, but rather as indications that certain teachers and/or 
teacher applicants require additional inservice work in order 
that they might develop these skills. The Department of 
Education should set up a program in which teachers in need of 
these services might receive this training. The College of the 
Virgin Islands is an excellent vehicle for such training. 

2) Curricula should be examined to determine whether all academic 
areas are teaching toward these higher level objectives. While 
the few written curricula located by the researchers seemed to 
indicate that these objectives are part of the curriculum and 
texts used in these classes seem to indicate that this is going 
on, more specific guidelines need to be established. 

3) No curriculum is teacher proof! While curricula and 
textbooks seem to indicate that higher level objectives are being 
taught, there is no evidence that classroom practice actually 
fosters the development of these objectives. Supervisors need to 
visit classrooms regularly and consult with teachers to make 
certain that higher level objectives are being taught. In 
addition, examinations should test these objectives. Students 
soon learn that the important things to learn are the things 
which are tested. Teachers should make special efforts to 
objectively test higher level objectives on their classroom 
tests. Such objectives can be tested using objective items and 
the skills needed for constructing such tests can be taught 
during inservice workshops and in courses at the College of the 
Virgin Islands. Again, this is a matter of supervision. 
Supervisors must be sure that this teaching and testing is going 
on in the classrooms for which they have responsibility. They 
should have the authority to assure that teachers who do not deal 
with these objectives are provided with the training necessary to 
allow them to do this, as well as the authority to remove from 
the classroom teachers who fail to accomplish this. 

Finally, a brief call for educational excellence. Our students 
will be competing in a more and more complex world. While it is 
fine to know how to spell and to use proper syntax and grammar, 
it is at least equally important that these same students have 
something intelligent to say. 